Robust Optical Recognition of Cursive Pashto Script Using Scale, Rotation and Location Invariant Approach
Authors
Abstract
The large number of unique shapes, called ligatures, in cursive languages, together with variations in scale, orientation and location, makes their recognition one of the most challenging pattern recognition problems. Recognizing this many ligatures is often a complicated task in oriental languages such as Pashto, Urdu, Persian and Arabic. Research on cursive script recognition often ignores the fact that scaling, orientation, location and font variations are common in printed cursive text; as a result, these variations are missing from image databases and experimental evaluations. This research uncovers the challenges faced by Arabic cursive script recognition in a holistic framework, taking Pashto as a test case because the Pashto language has a larger alphabet than Arabic, Persian and Urdu. A database containing 8000 images of 1000 unique ligatures with scaling, orientation and location variations is introduced. A feature space based on the scale invariant feature transform (SIFT), along with a segmentation framework, is proposed to overcome these challenges. The experimental results show significantly improved performance of the proposed scheme over traditional feature extraction techniques such as principal component analysis (PCA).
Similar resources
Feature Extraction Using Zernike Moments
Shape identification and feature extraction are the main concerns of any pattern recognition system. Object parameters are mostly dependent on spatio-temporal relationships among the pixels. However, feature extraction is a complex problem that must be addressed in terms of invariance, irrespective of position and orientation. Zernike moments are used as shape descriptors and identi...
Semi-Automated Transcription Generation for Pashto Cursive Script
Usually, a large amount of transcription data is required for training and benchmarking Optical Character Recognition (OCR) systems for new scripts like Pashto. In the case of real image data, the images are mostly acquired through scanning. Supervised training scenarios require a ground truth for the corresponding scanned images. Usually, the ground truth is created by tran...
Offline Handwritten MODI Character Recognition Using HU, Zernike Moments and Zoning
HOCR stands for Handwritten Optical Character Recognition. HOCR is the process of recognizing different handwritten characters from a digital image of a document. Automatic handwritten character recognition has attracted many researchers all over the world to contribute to the handwritten character recognition domain. Shape identification and feature extraction are a very important part of any cha...
Cursive Script Postal Address Recognition
By Prasun Sinha. Large variations in writing styles and difficulty in segmenting cursive words are the main reasons for cursive script postal address recognition being a challenging task. A scheme for locating and recognizing words based on over-segmentation followed by dynamic programming is proposed. This technique is being used for zip code extraction as ...
A New Approach to Segmentation of Persian Cursive Script based on Adjustment the Fragments
Optical Character Recognition (OCR) is a long-standing problem of great interest in the pattern recognition field. The recognition of cursive scripts such as Persian and Arabic is a difficult task because their segmentation suffers from serious problems. Segmentation is the process of dividing cursive words into smaller parts in order to decrease complexity and increase accuracy of re...